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Abstract. Accurate spin tracking is a valuable tool for understanding spin 
dynamics in particle accelerators and can help improve the performance of an 
accelerator. In this paper, we present a detailed discussion of the integrators in 
the spin tracking code gpuSpinTrack. We have implemented orbital integrators 
based on drift-kick, bend-kick, and matrix-kick splits. On top of the orbital 
integrators, we have implemented various integrators for the spin motion. 
These integrators use quaternions and Romberg quadratures to accelerate both 
the computation and the convergence of spin rotations. We evaluate their 
performance and accuracy in quantitative detail for individual elements as well 
as for the entire RHIC lattice. We exploit the inherently data-parallel nature 
of spin tracking to accelerate our algorithms on graphics processing units. 


1. Introduction 

The origin of nucleon spin remains an enduring puzzle in nuclear physics [6], 
and elucidating this puzzle is the principal focus of polarized beam experiments 
at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Labora¬ 
tory [32, 7, 37]. Because statistical uncertainties scale inversely with the square of 
the polarization [21], optimizing the polarization is essential for the efficient use of 
experimental resources. 

Computer simulations serve an important role in understanding and improving 
beam polarization. For example, the invariant spin field (ISF) places an important 
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upper bound on the maximum attainable polarization of a stored beam [34, 36], and 
hence a knowledge of the ISF and how it varies with the machine optics is essential 
to optimizing the beam polarization. 

Another motivation for spin-orbit tracking simulations—especially of high accura¬ 
cy—derives from proposals to use storage rings in searches for a permanent electric 
dipole moment (EDM) in protons and deuterons [50]. Assessing the sensitivity of 
such experiments will require long-term spin-orbit simulations of unprecedented 
accuracy [38]. Other projects that can benefit from similar studies include MEIC [4, 3], 
the LHeC [2, 14], and the muon g — 2 experiment [48, 55]. In addition, accurate 
spin-orbit tracking enables one to perform careful tests of mathematical concepts 
related to spin dynamics [9]. 

In spite of the manifest need for accurate simulations of spin dynamics, relatively 
few efforts have been made to develop codes that include spin tracking. In the 
context of hadron machines, which is the context of this paper, spin-orbit tracking 
codes for large storage rings and accelerators first emerged in the 1990s in response 
to the needs of specific projects. Meot added spin tracking to Zgoubi in 1992 [47, 46]. 
Hoffstatter and Berz did the same for COSY-Ineinity in 1993 [11]. Luccio developed 
SPINK in the mid-1990s to support the goal of accelerating polarized protons in 
RHIC [39, 41]. Hoffstatter and Vogt created SPRINT to study the feasibility of 
attaining proton polarization at very high energy in HERA [61, 36]. The code 
FORGET-ME-NOT, by Golubeva and Balandin, was developed for this purpose 
also [8]. In the mid-2000s spin motion was added to PTC by Forest [27, 30, 24] and 
to Bmad by Sagan [56]. Mane has recently added the code ELIMS to this list [43]. 

Some of these codes also include effects particular to electrons—especially syn¬ 
chrotron radiation. In addition, there are other more specialized codes that handle 
electrons: Of signal importance is the development of SLIM by Chao [16] in the 
early 1980s. 

The original version of SPINK used orbit transport matrices generated by 
MAD8—later by MAD-X—to compute the orbital motion. That version was used 
for extensive studies of the beam polarization in RHIC [39]. SPINK was later 
incorporated into the UAL framework and modified to use TEAPOT’s orbital 
integrators. The current work arose out of an effort to port UAL-SPINK to 
GPU platforms. In the course of that work, however, it was discovered that even 
when using TEAPOT’s orbital integrators, the code had difficulties with spin 
convergence [54, 1], especially in the neighborhood of a strong spin resonance. 
Addressing this issue meant slowing down what were already numerically-demanding 
spin tracking simulations. 

Here we present a very accurate and efficient spin-orbit tracking code, gpuSpin- 
Tragk, that we have incorporated into the UAL framework [42]. Because we have 
found that the accuracy of the orbital data has a significant impact on the accuracy 
of the spin tracking, our code relies on first performing very accurate symplectic 
integration for the orbital motion. With orbital data in hand, we then use the 
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Thomas-Bargman-Michel-Telegdi (Thomas-BMT, or T-BMT) equation [58, 10] to 
integrate the spin motion. Our integration methods include the effects of acceleration 
on both orbit and spin degrees of freedom. We note that because gpuSpinTrack 
tracks the full nonlinear orbital motion and the full 3D spin motion, it is sensitive 
to the full range of spin-orbit resonances. This is particularly important in the 
context of acceleration across spin-orbit resonances. The most significant aspect of 
this work is that we have found a means of accelerating the convergence of the spin 
integration. 

To obtain reliable results from numerical studies, one must have a quantitative 
understanding of the errors inherent in a given simulation. With a quantitative 
model for the errors, one can perform simulations of the desired accuracy without 
wasting computational effort motivated by overly pessimistic error estimates. Another 
significant contribution of this paper is a detailed analysis of the accuracy of our 
integrators. We study how the orbital and spin motion converge after one turn, 
and also the long-time stability of our algorithms. For the spin dynamics, we also 
examine the errors across individual elements. 

Numerical efficiency is another important consideration for spin-orbit tracking 
codes. To increase the speed of our simulations, we have implemented all our 
integrators on a graphics processing unit (GPU). The embarrassingly parallel nature 
of spin-orbit tracking (in the absence of space-charge) makes this type of computation 
an ideal fit to the highly parallel architecture of GPUs. 

In the next three sections, we describe the dynamical model used for our simula¬ 
tions (section 2), the orbital integrators we implement (section 3), and our approach 
to spin integration (section 4). The latter section describes what we refer to as 
Romberg quadratures for spin, our method for accelerating the convergence of spin 
integration. In section 5, we describe the performance of both orbit and spin integra¬ 
tors. We give a brief description of the GPU implementation in section 6. Finally, 
in section 7, we conclude by discussing our results. 

2. The Dynamical Model 

In this study, we ignore the effects of synchrotron radiation and space-charge forces. 
We therefore model the orbital dynamics in an accelerator using a single-particle 
Hamiltonian appropriate to the externally applied magnetic and electric fields of a 
particle accelerator. For the spin dynamics, we treat a particle’s spin expectation 
value as a unit three vector^ that obeys the Thomas-BMT equation [58, 10]. 

2.1. Model for orbital dynamics. To describe the orbital motion of particles 
in a beamline element, we use longitudinal distance s along the element as our 
independent variable, and we use the same canonical phase-space coordinates as 
MAD [20]: 

(1) Z={X,P„Y,Py,T,Pt). 

^Note that for particles with integer spin, there can be a state with vanishing spin expectation 
value. But this state is usually of little interest for experiments with polarized beams. 
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Here X and Y denote the horizontal and vertical coordinates in the local reference 


frame of our beamline element; Px and Py denote the corresponding conjugate 
momenta, divided by a fixed scale momentum po = m^oPoC; T = —cAt measures 
the flight time (times the speed of light, c) relative to a reference particle with 
momentum po; and Pt denotes the energy deviation scaled by poC, so that 


— myoC^ myc^ 1 


( 2 ) 


PoC PoC f3o ' 


Note that the minus sign in the definition of T means that a positive value for T 
implies that our particle arrives earlier than the reference particle. 

For magneto-static elements with strictly transverse fields that do not vary along 
the length, our Hamiltonian for a particle of charge q has the general form 



^{l + hX)A,{X,Y) + ^Pt 

Po Po 


(3) H 


where h = Ijpc denotes the curvature of the local coordinate frame— h will vanish 
except in sector bends—and As{X,Y) denotes the longitudinal component of the 
element’s vector potential. (We choose a gauge in which the transverse components 
of the vector potential vanish.) Note that this Hamiltonian assumes any bending 
occurs only in the horizontal plane; for bends in any other plane, we simply apply 
rotations before and after a horizontal bend. The last term, Pt/I^o^ accounts for 
the flight time of a particle with the reference momentum traversing the orbit 

x = y = 0. 

When integrating through a dipole, one must exercise some care concerning the 
relation between the fixed geometry, defined by a magnet’s curvature and placement, 
and the variable physics, defined by a magnet’s field strength. For sector bends, 
the fixed geometry means the curvature h of the local frame is fixed. In the usual 
practice, one then sets the scale momentum po according to this curvature, so that 
Po = qBipc^ where Bi denotes the magnet’s design field strength. Then qBi/po = 
and the vector potential term in (3) simplifies to ^(1 YhX)‘^, but only if the magnet 
is correctly powered. To include the possibility of mispowering errors, one must not 
thus confuse the geometry with the physics; one must instead retain the dependence 
of the Hamiltonian on the actual magnetic field strength. 

For rectangular bends, which have curvature = 0, a more fundamental difficulty 
arises from the fact that the orbital motion is most simply integrated using Cartesian 
coordinates, whereas the curved design orbit does not follow the straight magnet axis. 
This means the horizontal phase-space variables (X, P^,), which describe deviations 
from the magnet axis, cannot describe deviations from any choice of reference orbit. 
Moreover, for a given fixed bending angle, the path length—hence also the flight 
time—will depend on the entrance angle. As a consequence, the flight time along 
even a reference orbit will depend on how the rectangular bend is oriented relative 
to adjacent beamline elements. If we wish the phase-space variables of (1) at both 
entrance and exit of this magnet to represent deviations, then we must view the 
Hamiltonian (3) as written in a set of variables internal to this magnet (and we must 
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drop the term PtjPo). We then perform the appropriate conversions at entrance 
and exit. 

Of course real magnets, because they obey the dictates of Maxwell, do not have 
“strictly transverse fields that do not vary along the length”. That fiction — to which 
no sensible spectroscopist subscribes — proves useful in much of accelerator physics 
because (a) long magnets have fields for which that fiction is close to the truth, (b) 
short magnets yield orbital motion that is dominated by the integrated strength, 
and (c) fringe-field effects in multipole magnets, which arise primarily from the 
longitudinal field component, have contributions from both entrance and exit that 
tend to cancel, particularly in short magnets [23]. Still the field variation across a 
magnet fringe can have a significant impact on the beam dynamics. Cases where 
this is especially true include solenoids, whose longitudinal fields naturally have 
extended fringe regions; and rectangular bends, for which non-normal entrance and 
exit angles lead to vertical focusing [13]. In addition, nonlinear contributions in, for 
example, quadrupole fringe fields can cause large-emittance beams to occupy a large 
footprint in tune space [51]. 

In this paper we shall, for the most part, avoid detailed discussions of fringe 
fields. We make this choice for a couple of reasons, (i) The simulations shown in this 
paper were all done in the context of the Relativistic Heavy Ion Collider (RHIC) at 
Brookhaven National Lab, for which fringe field effects are relatively small, (ii) The 
subject of transfer maps for magnetic fringe fields is complicated, and very good 
discussions are available elsewhere. In addition to the work cited above, the reader 
may consult Forest and colleagues [25, 22, 28, 29] for work done mostly in the 
context of a hard-edge limit, or Dragt and colleagues [59, 60, 49, 18] for work on 
transfer maps across realistic magnets that include fringe fields. 

2.2. Model for spin dynamics. We describe a particle’s spin expectation value 
by a unit three vector S. The precession of this spin in a magnetic field B is governed 
by the Thomas-BMT equation [58, 10], which says that in the rest frame of the 
magnet—our laboratory frame—the spin precesses according to the rule 


— = 5 X [(1 + G7)B - G(7 - 1)U • . 

at my ^ 


(4) 


An additional term, not shown here, must be included in the presence of electric 
fields. In this equation, G denotes the gyro-magnetic anomaly, {g — 2)/2, with g the 
particle’s gyromagnetic ratio; and u denotes the unit velocity vector, obtained by 
normalising the momentum vector (Px^Py^Ps)^- Here Ps denotes the longitudinal 
component of the scaled momentum, given by 



(5) 


As in the case of orbital motion, it proves convenient to transform the equation 
of spin motion (4) to one with s as the independent variable. To do this, we 
multiply both sides of (4) by dt/ds = 1/5. It is important to note that in a curved 
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Frenet-Serret coordinate system, s ^ Vg. Rather, simple geometry tells us that 
i;, = (pc + X)e = (1 + hX)pce = (1 + hX)s. 

Using this result, we now obtain the modified T-BMT equation in the form 

A O 

(6a) — = U X 5, 

ds 

where 

(6b) Ct = [(1 + Gi)B - G '(7 - 1)(m ■ B)u] + hy, 

and {Bp)o = Po/q denotes the (signed) reference value of the magnetic rigidity. The 
last term, hp, accounts for the local frame rotation (assuming the rotation is about 
the axis y). 

Our model does not include spin-dependent contributions to the orbital motion, 
so-called Stern-Gerlach forces, because these are completely negligible for essentially 
all accelerators [33]. 


3. Orbit Integration 


To integrate a particle’s orbital motion through a given beamline element, we 
use the standard technique of writing the relevant Hamiltonian as a sum of one or 
several integrable pieces. One then obtains an approximate solution for the total 
Hamiltonian by an appropriate concatenation of exact solutions for the separate 
pieces. An important advantage of this approach of splitting the Hamiltonian is 
that it produces a symplectic integrator— i.e. an integrator that preserves the 
fundamental structure that underlies any Hamiltonian system. A second advantage 
is that by composing the partial solutions in a symmetric fashion, one can achieve 
second-order convergence to the exact solution. Indeed, more sophisticated symmetric 
compositions allow one to achieve even higher order convergence [62, 23]. For an 
extensive and valuable introduction to the literature on such splitting methods^ see 
McLachlan and Quispel [45]. 

There are several ways one might split the Hamiltonian (3). One of the oldest— 
still widely used—is the drift-kick split that is the basis of TEAPOT [57, 23]. One 
virtue of the drift-kick split is that it applies to a wide variety of beamline elements. 
If, however, we focus on particular elements, we can tailor the split accordingly. 
For ideal bending magnets—sector bends and rectangular bends—one can derive 
exact solutions, see sections 3.1 and 3.2. For quadrupoles, one can split off the 
linear transverse motion, leaving a much smaller nonlinear kick as a correction, 
see section 3.3. For higher-order multipoles, we return to the drift-kick split of 
TEAPOT. 

3.1. Sector bend. For a pure sector bend, one may write the vector potential term 
in (3) as 


(7) 


-^A,iX,Y) 

Po 


h 

2h{Bp)o 


{l + hXf, 
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where hi denotes the actual dipole field strength, and h denotes the (fixed) physical 
curvature 1/pc of the magnet. One can solve the corresponding equations of motion 
analytically [22]. First define the scaled momentum in the horizontal plane, 


(8) Pa = sjl+jPt + P^-Pl 

This quantity is a constant of the motion. Also define the dimensionless parameter 

(9) n = ^ 

^ ^ ^ h{Bp), PoA’ 

which measures the magnetic field strength relative to its design value. Then, setting 
5 = 0 at the magnet entrance, we obtain the sector bend trajectory in the form 


( 10 a) 

(lOb) 

( 10 c) 

(lOd) 

(lOe) 

(lOf) 


Px{s) = PI cos{hs) + - 7]{1 + hX^)] sm{hs), 


= T 

h \_r] 


Y{S) = V + -PyS + \Py 
T] ^ rjh ^ 

Py{s) = Pt, 


sm 


/ P 

-1 / 


Tis) 


Pt{s) = Pi, 


= A - -Pts 


7 ] 


■qh V/?o 
s 


+Pt 


sm 


sm 


-1 / 


1 (Px{s) 


• -l(Px{s) 

— sm ' 




where the superscript i denotes values at the entrance. 

Most simulations set the scale momentum po such that Po/q = Bi/h^ where 
Bi denotes the design value of the magnetic field strength. With this choice, the 
parameter p becomes 

h/h Bi + AB , , AB , , j 

where 6b denotes the relative magnet mispowering error. For this choice of scaling, 
and for a correctly powered magnet, i.e. 6 b =0 and 77 = 1 , an on-energy, on-axis 
particle will enter the magnet with ^= 0 , and will also exit the magnet with z = 0. 
In other words, this choice of scaling means that the map (10) with 77 = 1 preserves 
the origin. 


3.2. Rectangular bend. For an ideal rectangular bend, one may write the vector 
potential term in (3) as 

( 11 ) -^A,{X,Y) = j^X, 

Po \Bp)o 

where hi and {Bp)o have the same meanings as for the sector bend. One can again 
solve the corresponding equations of motion analytically [22]. Here we define the 


















D.T. ABELL, D. MEISER, V.H. RANJBAR, AND D.R BARBER 


length parameter 

( 12 ) 


Pb 


{Bp)o 

h ’ 


and we retain the definition of as in (8). Then, again setting s = 0 at the magnet 
entrance, 


(13a) 

(13b) 

(13c) 

(13d) 

(13e) 

(13f) 


Xis) = XA PblVP^ - L?(s) - - {Pi? 


p,(s) = K--, 

pb 

Y{s)=Y^ + PyPb 

Pyis) = Pi, 

1 


/ P’ 
• —1 / 

sm ' 


• —1 

— sm ' 


Pt{s)=Pl 


sm 


-lfPx\ 


— sm 


Remember that, per our discussion near the end of section 2.1, the phase-space 
coordinates here are internal to the rectangular bend. We require transformations 
that connect these coordinates to those used outside this magnet. See (16) and its 
solution (17) below. 

Since a beam usually enters and exits a rectangular bend at some non-zero angle 
(j) with respect to the magnet face, one must (a) perform dynamic rotations that 
transform the beam into and out of the Cartesian frame of the magnet, and (b) 
account for the vertical focusing effect. We describe the latter first. 

At the entrance, or exit, of a rectangular bend, a typical beam has a significant 
horizontal momentum Px with respect to a Cartesian frame aligned with the 
corresponding magnet face—approximately the sine of half the total bend angle of 
that magnet. Because the magnet fringe field includes a small horizontal component 
proportional (at lowest order) to the vertical displacement, a particle in that beam 
experiences a focusing kick (see (15d) below) 


AP^ oc -{±h)PxY, 

where b = ehijpo = hi/{Bp)o = l/p6 denotes the scaled dipole field strength, and 
the upper (lower) sign refers to the magnet entrance (exit). This kick, however, 
cannot represent the whole story, for this by itself cannot constitute a canonical 
transformation. 

Treating the magnetic field of a rectangular bend as mid-plane symmetric and 
as having no dependence on the horizontal coordinate X, one may write the corre¬ 
sponding vector potential in the form 

oo 2k 

^^(-l)'=i?(2fc-i)(^)|_, 0, -xB,{z) , 

where Bo{z) denotes the mid-plane magnetic field. In the thin-fringe limit, Bo{z) 
becomes a step function, and the B^P {z) become a delta function and multiple 


(14) A = 
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derivatives thereof. From here there remains a long journey to reach the goal of 
a symplectic transfer map that describes orbital motion across a thin fringe. We 
simply quote the result. In the absence of a finite-gap correction, one may obtain [22] 


(15a) 

xf = X^ + 

(15b) 

II 

(15c) 

y/ = 2V/ 

(15d) 

1 

II 

(15e) 

1 

II 

(15f) 

pI = Pi- 


±b + 2(tV)" 

2 P , \ 1 + { y ')2 [1 + 


{Yfy 


lib 


x'y' 


1 i {y’f 


2F» 


A 


1 - {y') 


/^2 


2P2 [1 + (y/)2]: 




Here, x' and y' respectively denote Px/Ps aiid PyjPg^ and Pq remains as given in 
(5). For the result including a finite-gap correction, consult [28, 29]. 

The dynamic rotations that transform the beam into and out of the Cartesian 
frame of the magnet amount to drifts in cylindrical coordinates. Because these 
transformations rotate the spin, they are important; on the other hand, they play no 
role in the accuracy of our spin integration. Nevertheless, for the sake of completeness, 
we include them here: The relevant Hamiltonian, 


(16) 


H^rot = -XWl + -P* + P2 _ p2 _ p2 = _XP^, 


yields the solution 

(17a) 

(17b) 

(17c) 

(17d) 

(17e) 

(17f) 


X(rh^ = ^ 

l-(PVPi)tan0’ 

A(<)>) = K cos (f) + p* sin (j), 


f,(« = p;. 
= A + 

Pt{ 4 >) = Pi- 


(-U/-U)tan^’ 


l//3o+Pt XHan^ 


1 - (U/A)tan(/)’ 


One needs to apply this transformation at the entrance with angle and at the 

exit with angle —In addition, if the design orbit is not symmetric across the 
magnet, i.e. if 0®^'* ^ — 0®"*'', one must correct for the horizontal offset 

cos (/)®“0' - cos 


(18) 


X 


offset 


= -L 


Aexit 
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Lastly, if the temporal variable represents a deviation, then one must subtract from 
T the amount 


La 


L 


(19) T = — — 

^ ^ "" /3o Po sin - sin 

corresponding to the time for the reference particle to cross this magnet. 


3.3. Quadrupole. For an ideal quadrupole, we write the vector potential term in 
(3) as 

(20) -^MX,Y) = -^iX^-Y^), 

Po 2{Bp)o 

where 62 denotes the quadrupole gradient. The most accurate known split of the 
resulting Hamiltonian separates out the linear transverse motion, retaining the exact 
dependence on the energy deviation [ 22 ]; thus. 


( 21 a) 


Bq — Hqi^ T -HgNL? 


where 

(21b) 

and 

(21c) 


u _ ^ ^ X ~ ^ y 

J^qL — - 


^2 




gNL 


2 yi + ^Pt+p2 2(pp), 


1 + ^P*+P2-P|-P|-1 




p2 ^ p2 




1 + 


J-P + P? 


This is the splitting we use for the quadrupole Hamiltonian. 

An alternative to the above split [26] does not include the correction for the 
energy deviation—the square root in (21b). While that split fails to produce the 
correct tune for off-energy particles, it does have the virtue of speed, because the 
same transfer matrix applies to all particles. 

To solve the Hamiltonians in (21), we first define the scaled total momentum 


(22) P = ^1+^P + P2, 

which is a constant of the motion in the quadrupole, and also the (energy-dependent) 
focusing strength 

^ \l{Bp)oP' 

Then, setting 5 = 0 at the magnet entrance, one obtains the solution for the linear 
transverse motion, exact in the energy deviation, as 

{24a) .Ym(a)=A-‘coa(a,) + a522M, 

P KS 

(24b) P2^\s) = Pi cos(ks) — hzPX^ sm{Ks), 

(24c) y (s) = y* cosh(Ks) + s L ^ 

P KS 

(24d) Py) (s) = P; cosh(Ks) + nPY^ sinh(Ks), 
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T(L)(s)=r-i(^^ + Pi 


(24e) 


(24f) pj^\s) = Pi. 


s 

+ 4 


+ 4 


KS 

T 


f] 


(pip 4- (pip 

^ +^^((xT-(yT) 

sm{2Ks) 


i\2 


i\2 


^ +(«n 


2ns 

sinh( 2 /^s) 

2ns 


-i Px sin2(Ks) Py sinP(Ks) 


^ X —- _ yri y ' 

P ns 


ns 


This solution assumes a positive value for the quadrupole gradient 62 , so that this 
magnet focuses in the X-Px plane and defocuses in the Y-Py plane. We can always 
convert a skew quadrupole, or a more general field orientation, to this case by 
applying rotations in the transverse plane— i. e. about the longitudinal axis—before 
and after the element. 

The nonlinear part of the quadrupole Hamiltonian, i^^NL, represents a “kick” of 
the position coordinates. A straightforward integration yields the solution 

(25a) X(nl)( 5 ) 

(25b) PiNL)(5) 

(25c) y(’^^)(s) 

(25d) PfL)(s) 

(25e) 

(25f) P}^^\s) 

where 

( 26 ) Pi = vi+2Pt/po+p^-{Pi)^-{Pif. 


= P* 

= y* + sP* ( — _— 

^ +^^y[pi p 

= Pl 


vlu P 2P2 ]+^o’ 


= A 


In addition to producing the correct tunes for off-energy particles, the Hamiltonian 
split shown in ( 21 ) has the additional advantage that the term P^nl is small in the 
paraxial approximation. In particular, it is fourth-order in the transverse momenta: 
PgNL ^ ^{P^Py) with m + n = 4. 

We employ the linear and nonlinear solutions shown here by concatenating 
the corresponding maps in a time-reversal-symmetric manner. Using the simplest 
symmetric approximation. 




(27) 


gS/2 /2 :Hqi^: 
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we obtain results that are accurate through to second order in the step size s. (Here 
we have used the colon notation introduced by Dragt [17, 19].) If desired, one may 
use higher-order versions of (27) [62, 45, 23]. 

For information about transfer maps for quadrupole fringe fields, consult [25]. 

3.4. Higher-order multipole. For an ideal straight mnltipole magnet—sextnpole, 
octupole, etc .—one may write the vector potential term in (3) as 

(28) - ^ {X, Y) = Re (x + iR)™, 

Po m[Bp)o 

where the bm and respectively denote the normal and skew multipole coefficient 
for a magnet with 2m poles. We split the resulting Hamiltonian into two pieces: 


(29a) 




mult 


— -f^drift “1“ -^77 


where 


(29b) 


Hiriit = -yi + + A - P| - P2 + Ip* 

= —Ps + 

Po 


and Hm denotes the multipole term in (28), or possibly a superposition of such 
terms. This constitutes the well-known drift-kick split. Because i^drift depends only 
on momenta, and Hm only on coordinates, they each generate very simple motion. 
We then solve the total Hamiltonian, ia a manner exactly analogous to our 

solution in (27) of the quadrupole Hamiltonian (21). 

For information about transfer maps for multipole fringe fields, consult [25]. 


3.5. Thin lens. A thin lens is a representation of a (short) multipole magnet 
wherein we imagine the length shrinks to zero and the strengths bm and grow to 
infinity in such a manner that their product remains constant. Except for taking 
this limit, we treat the thin lens just as we do the multipole of section 3.4. Since 
the drift vanishes, one may think of this as “a drift-kick split without the drift”. 


3.6. TEAPOT. The orbital integration performed in TEAPOT [57] is essentially 
identical in spirit to the integration of multipoles described above in section 3.4: 
One splits the general Hamiltonian (3) into a drift (in Frenet-Serret coordinates for 
a sector bend) and a momentum kick due to the magnetic field. Thus 

(30a) Ht = H]j ^ Hx^ 

where, in general, 

(30b) jf^ = -(i + /,x)y]AAi^+fTrF^iTF^, 

and 

(30c) HK = - — {lYhX)A,{X,Y). 

Po 
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When h vanishes, one obtains exactly the mnltipole drift-kick split of section 3.4. In 
the case of non-zero curvature, i.e. h ^ 0, the vector potential ^4^ becomes quite 
complicated. Nevertheless, because Hk depends only on coordinates, one may easily 
compute the motion generated. 

The drift Hamiltonian Hjj differs only slightly from that in (16), and we obtain 
the solution from (17) by replacing X with 1/h + X, and 0 with hs\ 


(31a) 

X(s) 

(31b) 

A(s) 

(31c) 

r(s) 

(31d) 

Pyis) 

(31e) 

Tis) 

(31f) 

PM) 


(1/h + X*) sec(hs) 1 

= l-(PJ/Pi)tan(M 

= PI cos{hs) + PI sin(hs). 


^ ^ ( 1//1 +V)tan(/i5) 

A 1 - (U/A)tan(M 

= PL 


_ rpl 


PI 


l/Po + Pt (l/h + X')tan(h5) 


1 - (Pi/PD^Mhs)' 


As in (27), we then combine the two solutions—the one above for Hd-, and the one 
for Hk —to approximate the motion generated by the full Hamiltonian Ht- 

A disadvantage of this integration scheme is that the two pieces of the Hamiltonian 
Ht in (30a) may be of similar order. While the resulting integrator is still second- 
order accurate in 5, it has a larger constant factor in the error term. This yields, 
overall, a less accurate integration for a given number of slices. This proves to be a 
significant issue when we go on to integrate the spin motion. 

On the other hand, drift-kick integration has the advantage that it applies to 
any magnetic field described by a longitudinal vector potential As{X,Y). Moreover, 
it does not suffer from the apparent discontinuity in the map that occurs for the 
quadrupole when using the matrix-kick integrator of section 3.3. Avoiding the use of 
control-flow statements [if (f ocusingQuad) {. . .} else {...}] can be an important 
consideration when using an integrator for map computation rather than particle 
tracking [23]. 


3.7. Solenoid. For the solenoid, we use a model which has a uniform longitudinal 
field in the body and the flux return confined to a thin-pancake fringe region. The 
longitudinal body field necessitates a transverse vector potential (6o/2)p0, which 
we write in the Cartesian form 

(32) A, = -^boY, Ay = ^boX, 

where bo denotes the magnetic field strength. 

For our thin-pancake fringe model, we multiply the components (32) by a profile 
function Ss{z) that rises from zero to one across an entrance fringe of length 5, 
remains constant across the body, and then returns to zero across an exit fringe of 
length 5. We eventually take the limit 5^0. This model means our pancake fringe 
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has a longitudinal field that rises, or falls, linearly across the fringe, and a radial 
field with fiux boirp^ crossing an area 27rps at radius p. 

Because we have a transverse vector potential, we must modify the general 
Hamiltonian (3): Since we use a Cartesian reference frame, we set = 0; of course 
we set = 0; and the canonical momentum now equals kinetic momentum plus 
vector potential. We therefore write the Hamiltonian for a magnetic solenoid in the 
form 


(33) 


Hso\ — 


= -P. 


Ap 

A 





On crossing one of the solenoid fringes, a particle experiences a transverse kick 
from the radial component of the magnetic field. The transverse vector potential 
also changes across the same fringe. It turns out that for our thin-pancake fringe 
model, those changes—to the kinetic momentum and the vector potential—cancel 
in a manner that leaves the transverse canonical momentum unchanged across the 
fringe. Moreover, in the small-5 limit, the transverse coordinates do not change 
across a fringe. Finally, we ignore the effect on the longitudinal momentum, because 
it is second-order in the transverse dynamical variables. The net result is that for 
our thin-pancake fringe, the Hamiltonian (33) with vector potential (32) describes 
the entire solenoid. 

Given the above model, one can compute an exact analytic result for the motion 
generated by the Hamiltonian (33). The solution comprises the product of a pair of 
commuting matrices that act on the transverse variables (X, P^,, T, P^), plus a shift 
of the temporal variable T : 


(34a) 


(34b) 



cos{ks) 

b sin(Ks) 

0 

0 

—psm{hzs) 

COs{hZs) 

0 

0 

0 

0 

cos{hzs) 

^ sin(/^5) 

0 

0 

—p sm{Ks) 

cos{ks) 

cos{ks) 

0 

sm{his) 

0 

0 

cos{ks) 

0 

sm{Ks) 

— sm{his) 

0 

cos{his) 

0 

0 

— sm{Ks) 

0 

cos{his) 


(34c) 


T{s)=T-(- + pA- 


(/3o 


In the above expressions, the parameter 

ebo 


(35) 


V = 


2po 2{Bp)c 


s 

Po' 


PI denotes the initial value of the square-root in the Hamiltonian (33), and n = pj PI. 
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The reader should note that the presence of P] in (34) means that—despite su¬ 
perficial appearances—the transformation represented by (34) is actually nonlinear. 

Later, when computing the spin precession vector we shall need to compute 
the direction vector u. All magnets discussed previously can be described using a 
longitudinal vector potential. Then, as mentioned in section 2.2, we can obtain u 
simply by normalising the momentum vector (P^,, Py^ Ps)^• For the solenoid, however, 
we must be sure to subtract the transverse components of the vector potential from 
the transverse canonical momenta to obtain the (scaled) kinetic momentum vector 

(36) rnyV = (p^ - —A^, Py - —Ay, P^) . 

V Po Po / 

From this we compute u = mjV/\\mjV\\. 

4. Spin Integration 

High-quality orbital integration is a prerequisite for accurate spin integration, 
because it yields accurate values for the precession vector fl. However, discretizing 
the spin motion leads to a separate source of error that can cause inaccurate spin 
precession, even in the context of perfect orbital integration. To address this difficulty, 
one might, for example, choose to adjust the number of spin slices independently 
of the number of orbital slices. In this section, we discuss this issue along with a 
description of our methods for tracking spin. 

4.1. Piece-wise constant (PWC) spin precession. A commonly used integra¬ 
tion strategy for spin treats the magnetic fields and velocity vectors in (fib) as 
constant throughout a slice of length As. One then integrates (fi), the Thomas-BMT 
equation, to the form 

(37a) S{s -h As) = R(cj) • 5(s), 

where R(cl;) denotes the 3x3 matrix that describes rotation about axis io by angle 
|u;|. We approximate this rotation vector as cj = As Cl, with Cl given in (fib). Then 
to compute the spin rotation across a whole element, one simply multiplies the 
contributions from each slice. For four slices, say, one thus transports an initial spin 
to a final spin according to 

(37b) = R(w4) ■ R(w3) ■ R(w 2) ■ R(wi) ■ S\ 

There are two sources of errors in this approach to spin integration. First, errors 
in the orbital integration feed into the spin integration via errors in the fields and 
velocities needed in the right-hand side of the Thomas-BMT equation. Second, 
treating the rotation axis as piece-wise constant introduces errors that arise from 
the non-commutativity of spin rotations around non-parallel axes. 

When using drift-kick integrators for the orbital motion, the first source of error 
usually dominates. Then increasing the number of orbital slices to improve the 
orbital accuracy automatically diminishes the magnitude of the second type of error. 
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OJiis) 



Figure 1. Schematic of interpolating a function using a cubic Bezier curve. 
The function to be interpolated is shown as a thin solid curve, and the 
constructed interpolating Bezier curve is shown by the heavy dashed curve. 
The large dots indicate the control points derived from the value and slope of 
the function at Si and 52- 

We thus find that with drift-kick integrators, increasing the number of orbital slices 
is essential for accurate spin integration. 

The situation changes when we use the more accurate bend-kick and matrix-kick 
integrators described above for dipoles and quadrupoles. These integrators allow 
us to take such large steps through both dipoles and quadrupoles that the lack of 
commutativity between consecutive spin rotations can now become an issue. 

To speed the accumulation of spin rotations across a set of orbital slices, we use 
quaternions to represent rotations [12]. Compared to matrix multiplication, this 
saves a factor of about two in the required arithmetic operations. In addition, the 
spin rotation angles are very often sufficiently small that one may approximate 
the needed trigonometric functions by low-order polynomials. Every so often, we 
explicitly normalize the resulting quaternions to ensure that they remain unit 
quaternions.^ 

4.2. Bezier interpolation. To treat the second source of error — arising from the 
fact that non-parallel rotations do not commute—we could do the same as for the first 
source of error: increase the number of orbital slices. That approach, however, seems 
unnecessarily expensive when we use very accurate, or exact, orbital integrators. 

It turns out that the rotation vector varies across a slice in a sufficiently simple 
manner that we may interpolate it across a slice using piece-wise cubic polynomials, 
constructed as Bezier curves [15]. To use this technique, we compute, at the edge 
of each slice during the orbital integration, not only fl, but also the derivatives of 
its three components.^ With that information, we construct a cubic interpolating 
polynomial with the correct slope at the end points; see figure I for a schematic 
illustration. For the variations that occur in our simulations, this approach yields 

^To simplify the bookkeeping, we do this at the end of each element, but once per turn would 
likely suffice. 

^Computing the longitudinal derivative of Q is tedious, but straightforward: Express the 
magnetic field in terms of particle location—so that Q depends solely on the orbital phase-space 
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very accurate interpolating functions. Moreover, the low cost of evaluating this 
interpolating polynomial makes it possible to take many spin steps per orbital slice. 

The PWC computation of the spin precession in (37b) will converge with a 
sufficiently large number of steps. If convergence requires an excessive number of 
steps, the use of cubic Bezier curves can speed up the computation. What the Bezier 
approach does not do is increase the rate at which the computation converges. For 
that we must look elsewhere, and that is the topic of the next section. 

4.3. Romberg quadratures for spin precession. As we have seen, the spin 
precession across a beamline element is not, in reality, piece-wise constant; rather, 
within an element it varies smoothly with s. More significantly, the integrators 
based on (37) exhibit second-order convergence (see section 5.2). This quadratic 
convergence suggests the use of some accelerating technique to cancel the errors. In 
particular, we have applied a Romberg approach [5, 53] to spin integration, and, as 
we show later in this paper, the improvement is dramatic. 

Instead of computing the rotation vectors at the middle of each slice, we now, for 
this new approach, compute them at the edges of each slice. We then accumulate 
the spin precession as in (37). For the first and last edges, however, we use a half¬ 
step; i.e. we replace As by As/2. This is akin to using the trapezoidal rule for 
integration [5, 35]. We do this using N slices, with N a multiple of some power of 
two. 

During the orbital integration, we record the value of (l at the edge of each 
slice. Then by keeping every other O, or every fourth, or every eighth, etc., we 
can approximate the net spin precession using a range of additional step-sizes that 
are related to the original by powers of two. In other words, we compute for these 
different step sizes a net quaternion that represents the spin precession of a given 
particle across an element. We obtain thus the sequence of quaternions 

(38) Q(/i), Q{h/2), Q{h/2^), Q{h/2^), ..., 

where h denotes the coarsest step size. 

For example, when crossing an element using eight slices, we will compute a total 
of nine Qs: Oq at the beginning of the first slice, 0i at the end of the first slice, on 
through Qg at the end of the last slice. Now, assuming each slice has length As, we 
multiply the first and last of these by As/2, and all the others by As, to obtain the 
sequence 

-Asf^o? Asf^i, As^l2: •••5 -AsQg. 

After constructing the quaternions that correspond to each of these nine precession 
vectors, we multiply them together to obtain a net quaternion that represents one 
approximation to the exact spin precession across this element. This is one of the 

variables. Then use the chain rule to compute 

dn _ dn dx dn dPx 

The quantities dgX^ dgPx, • • •, can be obtained from the orbital equations of motion. 
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entries in the sequence (38) above. To obtain the entry to its immediate left, we 
drop every other O and compensate by doubling the step size; the sequence 

2Asfl25 •••5 Asfig? 


thus yields another—the next coarser—approximation to the spin precession across 
this element. And so on. 

We then compute the Romberg limit: First define by 
(39a) Qok = Q{h/2’^) 


the sequence of approximations in (38). Then use the rule 


(39b) 


^ - Qj,k-i 


to construct the Romberg table: 


(39c) 


Qoo 



Qoi 

Qii 


Qo2 

Qi2 

Q 22 

Qo3 

Qi3 

Q 23 Q 33 


This table may have more or fewer rows than indicated here, but the number at the 
bottom right is the Romberg limit of the initial data given in the first column. We 
normalise the resulting quaternion at the end of each element. 

When integrating a well-behaved function over a finite interval, the trapezoidal 
rule together with the Romberg limit performs remarkably well with modest compu¬ 
tational effort. Its efficiency, however, derives from the structure of the error term 
seen in the Euler-Maclaurin summation formula [35, 53], and the manner in which 
the Romberg algorithm cancels those errors. Here, on the other hand, we have a 
product of quaternions, and hence no a priori reason to suspect that the above will 
actually work. We tried it for a lark. 


4.4. Solenoid fringe. In the thin-pancake model discussed earlier for the solenoid 
fringe, there exists a well-defined limit of the fringe length 5 times the precession 
vector fl, as 5 ^ 0. In that limit, a spin crossing a solenoid fringe experiences a net 
rotation given by the vector 


(40) 


CJ = ± 


bo/2 

iBp)oPs 


X P V P 

(1 + G7)(V L 0) - G(7 - 1) 


where we choose the leading plus or minus for entrance or exit, respectively. Here 
the kinetic momentum vector m^V is as given in (36), except that we must insert a 
factor of I in front of and In a similar manner, Pg denotes the square-root 
in (33), but with the same factor of | inserted before Ax and Ay. These factors of 
^ enter because in going to the limit of a zero-length fringe, we evaluate the field 
strength in the middle of that fringe region. 
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4.5. Dipole and multipole fringes. A particle’s spin also experiences a kick when 
crossing the fringe of a multipole magnet. This spin kick arises predominantly from 
the longitudinal field component in the fringe. In the limit of a thin fringe, and 
to lowest order in the dynamical variables, the spin kick caused by the fringe of 
a normal (z.e. not skew) 2m-pole magnet of strength bm is given by the rotation 
vector 

±b 

(41) iOm = - /p 7 — sin(m0) z, 

{Bp)o m 


where we convert sin(m(/)) to a function of X and Y using {X Y'lY)'^ = 

For a skew multipole, in (41) replace sin by cos, and bm by a^. 

The above result (41) is appropriate for beams that cross a fringe roughly 
normal to the entrance or exit face of the magnet. When entering, or exiting, a 
rectangular bend, however, a particle typically sees the longitudinal field Bz as 
having a significant component transverse to its velocity. As long as Px is not too 
large, one may account for this effect by simply adding an extra term to (41) to 
obtain the rotation vector 


(42) 


<^rb — (^1 + 


Ybi 

{ Bp)o 


G(7 - l)P:, X, 


which describes the spin kick experienced by a particle crossing the fringe of a 
rectangular bend. Note the factor of Gy in the second term: it is one good reason for 
not using rectangular bends in machines designed for polarized beams, particularly 
at high energy. 

If one requires a higher order computation of the spin kick across a magnet fringe, 
it is possible to apply a procedure analogous to that used for orbital kicks across 
fringe fields in [22]. 


5. Performance of integrators 

In this section, we examine the performance of the orbital and spin integrators 
described in the previous two sections. Our principal focus is on the integrators 
for spin. We examine the orbital integrators — much more briefly — with the goal of 
understanding their impact on spin integration. 

For the orbital integrators, we examine their performance in a particular context: 
that of Brookhaven’s Relativistic Heavy Ion Collider (RHIC) [32] operating at 
approximately 250 GeV, with the optics settings used before the beta squeeze. For 
the spin integrators, we examine their single-element performance, as well as their 
performance in the context of spin tracking for RHIC. In this context, we note that 
RHIC has two full Siberian snakes on opposite sides of the ring to flip the spins, 
with the snake angles set to ±45°. These settings mean that for a perfectly aligned 
RHIC, the design orbit will have a spin-tune of exactly | [7, 37]. Sextupoles are 
adjusted to set the chromaticities to { ixiiy ) = (0.70,0.74), and the spin rotators 
near the interaction points are turned off. The latter are needed only after the 
beta squeeze, when the counter-rotating RHIC beams are brought to collision, and 
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Figure 2. Poincare sections for four sample particles using drift-kick integra¬ 
tion with (a) = 4 and (b) = 16, and using BK/MK integration with (c) 

= 1 and (d) = 16. 

the experiments require longitudinal polarization at the interaction point. For our 
simulations, then, the equilibrium polarization is roughly vertical throughout the 
ring. 

5.1. Orbit integrators. For the simulations discussed here, we read a RHIC 
lattice description from a file in SXF format [31]. Which integrator to use for which 
elements is set in a separate kapdf’ file (for Accelerator Propagator Description 
Format). When referring to the integrators, we use the acronym BK/MK, ^ or 
sometimes just “bend-kick”, to mean the use of (10) for sector bends, together with 
(24) and (25) for quadrupoles. The Siberian snakes were modelled as thin elements: 
they are transparent to the orbital motion, and they act on the spin according to 
snake angles defined in an input file. In the simulations described here, we use a base 
number of slices for all dipoles and most quadrupoles. Then for the strongest 
quadrupoles, those in the interaction region, we use slices. 


‘^Bend-Kick/Matrix-Kick 
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Figure 3. Spectrum of X obtained using (a) drift-kick integration with = 8, 
or (b) BK/MK integration with = 1. The indicated tunes are all taken 
modulo 1. 


5.1.1. Poincare sections. To give a sense of how well the drift-kick and BK/MK 
integrators reproduce gross features of the orbital motion, we show Poincare sections 
of the X-Px plane; see figure 2. The range of amplitudes shown is representative of 
beams at this azimuth in RHIC—the interaction point, before the beta squeeze is 
applied to the optics. 

Figure 2 indicates that the drift-kick and BK/MK integrators produce the same 
gross features of the orbital motion. There are, however, differences in the details. In 
particular, the drift-kick traces with = 4 seem more elliptical, while the remaining 
traces seem more ovoid. The trace-widths also seem more regular for drift-kick 
integration with n^ = 4, as compared to all the others. Differences between the two 
BK/MK integrators, with = 1 and = 16, are not visible with the bare eye. 
Differences between (b), (c), and (d) are much smaller than those between (a), on 
the one hand, and (c) and (d), on the other. 


5.1.2. Orbital spectra and tune fitting. In figure 3 we show the spectrum of X 
obtained by taking a discrete Fourier transform of the X coordinates obtained over 
the course of T = 1000 turns. 


(43) 


Sx{i^) = 


1 ^ 

t=o 


— i27TtU 


X{t) 


The spectra were averaged over a bunch containing N = 1024 particles with 
amplitudes typical for RHIC. The tunes marked in the figure (see top edge) are 
the design values for RHIC at this energy (215.735GeV kinetic): Ux = 28.6835, 
Uy = 29.6742. Note that we see peaks not only at Ux and but also at 2^^, and 
2ny —evidence of significant nonlinearity in the orbital motion. 
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Figure 4. One-turn errors in X versus for drift-kick integration (upper 
curve), and BK/MK integration (lower curve). The errors were computed by 
comparing with a data set produced using BK/MK integration and = 256. 


As shown in figure 3, one can obtain a close match to the design orbital tunes 
using either drift-kick or BK/MK integration. However, in the case of drift-kick 
integration, one must adjust the strengths of the main quadrupoles by an amount 
that depends on the number of orbital slices used. No such adjustment is required 
when using BK/MK integration. 

In the context of purely orbital tracking, adjusting the quadrupole strengths to 
fit the desired tunes is a sensible means of ensuring the correct linear behavior in 
accelerator simulations. In the context of spin tracking, however, such adjustments 
might be problematic, because changing a quadrupole’s strength changes the amount 
of spin precession a particle experiences as it crosses that element. In other words, 
fitting the orbital tune can perturb the integrated spin precession angle of a given 
particle—particularly at high energy, z.e., at large values of the naive spin tune, Gy. 

We can estimate the importance of this effect as follows. On crossing a quadrupole, 
a spin experiences a net precession with magnitude given approximately by cj = — (1 + 
Gj)Ab 2 L/{Bp)o, where A denotes the particle amplitude, and L the quadrupole 
length. For the RHIC lattices we studied, the tune fitting required a relative change 
in 62 that varied greatly with the step-size used for crossing the quadrupoles. For 
finely sliced lattices, the relative change in 62 (from the value used for BK/MK 
integration) could be as small as a few times 10 “^. At RHIC energies, this adjustment 
yields a sub-prad change in the spin precession, and perhaps a few prads over a 
full turn. As we shall see later, when one uses drift-kick orbital integration, this 
variation is negligible compared to other sources of error. On the other hand, for 
very coarse slicing, the relative change in 62 can exceed 0.1 %, which leads to a small, 
though noticeable, change in the simulated spin precession angle. 
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Figure 5. One-turn errors in X versus amplitude for several different integra¬ 
tors: drift-kick with ni) = 32 (upper curve), and BK/MK with = 1 (lower 
curve). 


5.1.3. Single-turn errors. For accurate spin tracking, it does not suffice to integrate 
the orbital motion in a manner that merely reproduces global properties of the phase- 
space distribution. Errors in particle orbits introduce errors in the spin rotations, 
possibly changing the character of the simulated spin dynamics. As a consequence, 
one must compute accurately the trajectories of individual particles. 

To gain an understanding of the orbital errors on a per-particle basis, we look 
at the rms deviation between computed particle orbits and a reference solution for 
each particle. For the X coordinate after t turns, we thus compute 


(44) AX{t;n,) = 

The sum is over the particles in the bunch. The reference solution is computed with 
ni) = 128 using the BK/MK integrators, i.e. = Xj{t; 128). 

Figure 4 shows the absolute error AX(1; n^) after one turn for a RHIC beam at 
approximately 216 GeV with a 95% horizontal emittance Sx of 157rmmmrad. The 
slope —2 on the log-log scale shows that both drift-kick and BK/MK integrators 
exhibit second-order convergence in s. The drift-kick integrator starts out with a 
relative error of order one (c/. figure 2), indicating that with this very large step-size, 
the phases of the orbital motion are completely off: the errors are as large as they 
can be with a symplectic integrator. At = 64, the relative errors have decreased 
to about a part in 10^. On the other hand, the BK/MK integrators exhibit relative 
errors which start out below a part in 10^ and decrease until they are below about 
a part in 10^. 

Figure 5 also shows one-turn errors in X, but here as a function of particle 
emittance. These results, which are consistent with those of figure 4, tell us that the 
accuracies of the different integrators have roughly the same amplitude dependence. 
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Figure 6. Evolution of the orbital error in X as a function of turn number t. 
The four curves correspond to different methods of integration: from top to 
bottom they are drift-kick with = 8 and no tune fit (tan), drift-kick after 
tune fitting with = 4 (green), and BK/MK with = 1 (purple), and 4 
(maroon). For the reference solution, we used BK/MK with = 16. 

To reduce the noise level in this graphic, we have averaged the results within non¬ 
overlapping bins that each contained 20 particles in a narrow slice of emittance. The 
bands indicate one standard deviation above and below the computed average. 

5.1.4. Evolution of orbital errors over many turns. Figure 6 shows the errors in the 
X coordinate as a function of the number of turns for drift-kick integration and 
BK/MK integration. If we do not adjust the quadrupole strengths so as to fit the 
desired tunes, then the drift-kick integration errors oscillate periodically because of 
the tune errors. We see that behavior in the uppermost (tan) curve, for drift-kick 
integration with = S. After fitting the tunes, we note that drift-kick integration 
errors with = 4 now track the minima of the tan oscillatory errors. 

While the errors observed are much smaller for BK/MK integration—even with 
ni) = 1—they always exhibit the same qualitative behavior: After a few hundred 
turns, even when using BK/MK integration, the errors grow approximately linearly 
because of unavoidable tiny differences in the tunes. We believe this is not a significant 
issue. 

5.2. Spin integrators. Here we describe the accuracy of our spin integrators. First, 
we discuss the errors seen when tracking across various individual beamline elements, 
where we focus on the impact of both step-size and Romberg iterations. We then 
discuss, more briefly, the spin integration errors seen over the course of a single turn 
and many turns. 

5.2.1. Single-element errors. The graphics in figure 7 show the spin integration 
errors we computed when tracking a particular particle^ across several different 
beamline elements. From top to bottom, those elements are a quadrupole, a sector 

^While not essential to our discussion, we note that this particle had the initial phase-space values 
X'^ = 1.0 mm, = O.lmrad, Py = —0.2mrad, and P^ corresponding to a relative momentum 
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Figure 7. Absolute error in the computed spin versus the number of slices for 
a proton crossing several different beamline elements. All but the lower right 
graphic show results for a 200 GeV crossing a RHIC IR quad (top left), a 
RHIC arc dipole (top right), and a 2.1m, 1.3T solenoid (bottom left). The 
bottom-right graphic shows the result for a 25 GeV traversing the same 
solenoid. Each curve corresponds to a different method of integrating the spin: 
PWC for piece-wise constant integration; the k values indicate the number of 
Romberg iterations. 

bend, a solenoid, and another solenoid (but with a very low energy proton, such 
that the total spin rotation is comparable to 90°). In each of those graphics, the 
solid curve labelled PWC corresponds to piece-wise constant spin integration, where 
for the orbital motion we used the integrators given in sections 3.1, 3.3, and 3.7. In 
all cases the slope —2 reveals second-order convergence for PWC spin integration. 
The remaining curves correspond to k Romberg iterations {i.e. Qkk) applied to the 
PWC data obtained for the given number of slices. Since we do not know the result 
of exact spin integration, we have here estimated the spin integration error as the 
absolute difference between the result obtained using N slices and what we 
considered our “best” result, 5best- 

In figure 7, we have used for iSbest the results we obtained using 256 slices and 
three Romberg iterations. If we instead use for iSbest our most finely sliced PWC 
result, then small details in these graphics change, but the overall implication 
holds—that one or more iterations of the Romberg procedure can dramatically 
reduce the errors made by PWC spin integration. In the case of the quadrupole 


deviation (5 = 10 We set both y* and T* to zero. In addition, we set this particle’s initial spin 
toS^ (0.1952,0.9759,0.0976) 
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Figure 8. Absolute error in the computed spin across a RHIC dipole for 
a range of initial conditions in orbital phase space. Darker (lighter) grays 
represent smaller (larger) errors. Points outside the range indicated above each 
graphic are colored red if below, and blue if above. The error range covers 
the same five decades in the four graphics to the left. In the right-hand two 
graphics, the error range is again the same, but covers just a decade and a 
half that is mostly above the range in the other graphics. The upper and lower 
rows respectively show results obtained using PWC spin integration with 
four and eight slices. The three columns correspond to different values for Py 
and P/: Py = 0, P/ = 0 (left); Py = 20prad, P/ = 0 (middle); Py = 20prad, 
Pi ^ 0.35 X 10-3 (right). 


integrated using four slices, we see that two Romberg iterations yield a four-decade 
reduction in the error—to a level that requires some four-hundred slices using just 
PWC spin integration. For the solenoid, a similar statement holds in the case of 
a low-energy proton; and in the high-energy case, one Romberg iteration suffices 
to reduce the spin integration error to the level of round-off. For the sector bend, 
we see a less dramatic absolute reduction in the error; but even there, between 8 
and 64 slices, the slope —4 on the k = 1 curve tells us that one Romberg iteration 
converts second-order PWC integration to fourth-order. 

The graphics in figure 7 illustrate the scaling of spin integration errors for one 
particular particle, and the impact of Romberg iterations on PWC spin integration. 
Were we just lucky, or do we see similar behavior across the relevant phase-space? 
The next several figures address this question, illustrating the effect of Romberg 
quadratures on spin integration for particles covering a range of initial conditions in 
phase-space. 

For the sector bend, we examine spin integration errors across a RHIC dipole: 
a 9.45 m long magnet set to bend 200 GeV protons 2.225°, so with magnetic field 
strength B ^ 2.75 T. As a proxy for exact tracking, we settled on, after considerable 
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Figure 9. Absolute error in the computed spin across a RHIC dipole for 
a range of initial conditions in orbital phase space. Each of the graphics in 
this figure show results for the same orbital initial conditions and the same 
number of slices as the corresponding graphic in figure 8. The differences are 
that the integration here includes one Romberg step, and the error ranges are 
significantly lower. 


testing, one Romberg iteration applied to 128 slices of PWC spin integration. 
Furthermore, because a magneto-static dipole is translationally invariant in both Y 
and T, we examine how the spin integration errors vary with respect to just A, 

Py, and Pt. 

In figure 8, we show the absolute error in the computed spin using PWC spin 
integration with either four or eight slices (upper and lower rows, respectively). 
Figure 9 shows the corresponding results obtained after we apply one Romberg 
iteration. To help the reader make a proper comparison, we note further details in 
the captions of those two figures. 

A comparison of figures 8 and 9 suggests that applying Romberg quadratures 
to spin integration across a sector bend does indeed reduce the error throughout 
phase-space. But we are also struck by some of the features seen in figure 8: the 
pronounced valley that runs diagonally across the four graphics on the left; and 
the much flatter and somewhat larger errors in the presence of an energy deviation, 
seen in the two graphics on the right. 

To gain some understanding of these features, we examined a much larger range 
of phase space; a range that is, in fact un-physically large for an arc dipole in RHIC. 
Figures 10 and 11 show the results. Each of the one hundred small graphics in 
those two figures covers the same range of horizontal phase space: ±5 cm in A, 
and ±6mrad in Px. They are laid out in four matrices of twenty-five graphics each, 
where the columns correspond to five different values of P/, evenly spaced in the 
range ±2 x 10“^; and the rows correspond to five different values of P^, evenly 
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Figure 10. Absolute error in the computed spin using four slices across a 
RHIC dipole. Each small graphic covers a range of initial conditions in hori¬ 
zontal phase-space (see labels at lower left). The rows and columns correspond 
respectively to different initial values for Py and Pt (see labels along right and 
top edges). For the lower set of graphics, we added one Romberg iteration. 
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Figure 11 . Absolute error in the computed spin using eight slices across a 
RHIC dipole for a range of initial conditions. See also the caption of figure 10. 
The two sets of twenty-five graphics correspond to PWC spin integration 
(upper set) and PWC plus one Romberg iteration (lower set). 


spaced in the range ±0.2 mrad. (See the labels at the top and to the right of each 
matrix of graphics.) To obtain the graphics in figure 10, we used four slices of PWC 
spin integration for the upper set of twenty-five graphics, and four slices plus one 
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Romberg iteration for the lower set. We obtained the data shown in figure 11 using 
eight slices. In all the graphics, the gray-scale covers the same (logarithmic) range 
of errors: from black at 10 “^^ to white at 10 “^’^. 

The graphics shown in figures 10 and 11 reiterate the message that applying a 
Romberg step to the result of PWC spin integration can reduce the error throughout 
the phase-space of orbital initial conditions. In addition, however, we draw the 
reader’s attention to the yellow curve, overlaid on each graphic, which clearly follows 
the deep valley seen in those graphics for which no Romberg step was applied. In 
each graphic, this curve is the locus of points {X\ P^) for which the value of at 
the magnet exit equals the value P^ at the magnet entrance. This value is given by 



(45) 



Note that the dependence of on Py and Pt occurs only under the square-root, 
and that there it is linear in Pt and quadratic in Py. This difference (together with 
the fact that Py and Pt denote small quantities) explains why the yellow curves in 
figures 10 and 11 show little dependence on P^, but very sensitive dependence on 


Earlier, when examining figure 8 , we noted that the two right-hand graphics have 
much flatter and somewhat larger error profiles than appear in the left-hand four 
graphics. Now, on comparing with the graphics in figure 10, we see what happened: 
the valley simply moved “off stage” as we increased the energy deviation. 

For the quadrupole, we examine spin integration errors across a RHIC interaction 
region (IR) quadrupole: in this case a 1.83 m long magnet set to focus 200 GeV 
protons with a magnetic quadrupole gradient of 62 = 70.0 T m“^. As a proxy for 
exact tracking, we settled on, again after much testing, three Romberg iterations 
applied to 128 slices of PWC spin integration. 

A coarse examination over the range of initial conditions ±20 mm in X and T, 
±20prad in P^ and Py^ and ±0.35 x 10“^ in Pt indicate that (at least within this 
phase-space domain) spin integration errors change little with respect to variations 
in the transverse momenta Px and Py^ or the relative energy deviation Pt. As a 
consequence, here we show graphics of the spin integration errors for this element 
only in the (X, Y) plane, with P^ = Py = P^ = 0. 

In the case of the sector bend, the spin integration error does not appear to 
depend on the initial spin orientation. For the quadrupole, however, the accuracy of 
spin integration very definitely depends on the initial spin, so we have included that 
dependence in the results shown here. 

In figure 12 , we show the absolute error in the computed spin across a RHIC IR 
quadrupole. The first two rows show results obtained with PWC spin integration 
using either four slices (first row), or eight slices (second). The results shown in the 
lower two rows correspond to applying our Romberg procedure to the first two rows: 
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Figure 12. Absolute error in the computed spin across a RHIC IR quadrupole. 
The rows, from top to bottom, show results obtained using four slices, eight 
slices, four slices plus two Romberg steps, and eight slices plus three Romberg 
steps. The columns, from left to right, show results averaged over spins covering 
an opening angle about the vertical of 0°, 20°, and 180°. 
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four slices plus two Romberg steps (third row), or eight slices plus three Romberg 
steps (bottom row). As indicated above each graphic, the gray-scale in the first two 
rows covers the same (logarithmic) range of errors: from black at 10 “^ to white at 
10“^. In the third row, the range goes from 10“^^ to which means most of 

the errors in the third row (all except small triangles in the corners) he below the 
errors in the upper two rows. In the last row, the range of 10“^^ to means 

the errors here lie entirely below those of the upper two rows; and they lie below 
most of the errors in the third row. 

The columns in figure 12 correspond to averaging over various sets of initial 
spins. We generated a quasi-random distribution of spins (z.e., points on the unit 
sphere [44]) with opening angle a about the vertical: a = 0° (left column), 20° 
(middle), and 180° (right). 

In the case of a vertical spin {a = 0°), the errors vanish for orbital motion in the 
mid-plane because the vertical spin remains parallel to the vertical field everywhere 
along those orbits and therefore does not process. What about other particles, those 
launched still with a vertical initial spin, but away from the mid-plane? For particles 
traversing a RHIC IR quad, the Q of ( 6 b) lies predominantly along the quadrupole 
magnetic field, ( 62 T, 62 X, 0 ). Since the component of Q orthogonal to the spin is the 
component that rotates the spin, it follows that an initially vertical spin should be 
most sensitive to variations in T, and rather insensitive to variations in X. This 
suggests why the two graphics in the upper left of figure 12 —those for a = 0 ° and no 
Romberg iterations—show essentially no variation with respect to Xb Moreover, the 
fact that the orbital variation in Y is (to first order) proportional to F* suggests why 
the errors in those same two graphics increase away from the horizontal mid-plane. 

As we allow the distribution of initial spins to open up (a > 0 °), the distribution 
of errors (before the application of any Romberg iterations) quickly becomes more 
symmetric about the quadrupole axis. See the four graphics in the upper-right of 
figure 12 . After we apply one or more Romberg steps, the quadrupole spin-integration 
errors become smaller and develop additional structure across the (X, F) plane. 
See the bottom two rows of figure 12 . We do not yet understand the origin of this 
structure. 

For the solenoid, we examine spin integration errors across a 1.3 T, 2.1m long 
magnet. For the 200 GeV protons we used in previous elements, we find that four 
slices together with one or two levels of Romberg quadratures suffice to reduce errors 
to the level of round-off. In this case, however, the particle experiences very little 
spin rotation—of order 10 mrad. We therefore chose to look at the spin integration 
errors for a much lower energy proton, 1.2 GeV, traversing this same magnet. In this 
case the spin rotation is of order 0.6 rad. As a proxy for exact tracking, we again 
settled on three Romberg iterations applied to 128 slices of PWC spin integration. 

Given the homogeneous nature of a solenoid’s body magnetic field in our thin- 
fringe approximation, we expect spin integration errors across this element to exhibit 
very little dependence on X% F% or A coarse examination over a range of initial 
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Figure 13. Absolute error in the computed spin across a solenoid. We com¬ 
puted the data shown in these three graphics using, from left to right, eight 
slices PWC spin integration, four slices PWC plus two Romberg iterations, 
and eight slices PWC plus three Romberg iterations. Note the different error 
ranges indicated above each graphic: with each step across the page, the errors 
fall by several decades. 


conditions in phase space indicates that this is indeed the case. As a consequence, 
here we show graphics of the spin integration errors for this element only in the 
{Px,Py) plane. 

In figure 13, we show the absolute error in the computed spin for a 1.2 GeV 
proton traversing the solenoid described above. The top graphic shows the result 
obtained using eight slices of PWC spin integration. The two graphics below show 
results obtained using four slices plus two Romberg steps (middle), and eight slices 
plus three Romberg steps (bottom). As indicated by the ranges noted above each 
graphic, the error ranges in these three graphics do not overlap. Indeed, the error 
range drops by several decades with each step down the page. 

5.2.2. Single-turn errors. In this section, we examine spin integration errors after a 
full turn in the same RHIC lattice used for the orbital studies in section 5.1. See 
figures 14 and 15. As our measure of the error, we use the mean difference between 
the spin computed using a given set of numerical algorithms, and a reference solution 
that we use as a proxy for the exact result. The mean is taken over an ensemble of 
spins in a beam. We compute thus, at turn t, 

(46) AS{t) =-Y,\sAt) - ■ 

i=i 

For the results shown here, we computed the reference solution using BK/MK 
integration with = 256 and three Romberg iterations. 

Figure 14 shows the absolute spin error as a function of the number of spin 
slices, here chosen identical to the number of orbital slices. No Romberg steps were 
applied, except when computing the reference solution. As a consequence, differences 
seen here between drift-kick results (upper two curves) and BK/MK results (lower 
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Figure 14. Single-turn spin errors as a function of the number of slices 
for several different integrators: drift-kick (blue circles, tan diamonds), and 
BK/MK (purple squares). The blue circles represent DKSKD: traverse each 
slice using half-drift, half-kick, spin rotation, half-kick, half-drift. The purple 
squares represent DKDSDKD: traverse each slice using half-DKD, spin 
rotation, half-DKD. The reference solution used BK/MK with rib = 256 
together with three Romberg iterations. 


curve) derive solely from differences in the orbital data passed to the Thomas-BMT 
equation. 

At small numbers of orbital slices, spin integration based on drift-kick orbital 
integration reproduces only the one-turn spin-precession axis (close to the vertical). 
The precession angle about that axis, however, is completely off: compared to the 
reference solution, the phase of the spin precession is distributed uniformly over a 
27r interval. Increasing the number of orbital slices improves the accuracy of spin 
integration, and at rib = 64 the single-turn spin-phase errors fall below a mrad. 

When performing spin integration based on BK/MK orbital integration, we 
obtain errors that are consistently two-and-a-half decades below those based on 
drift-kick integration (see the lowest curve in figure 14). This result, we emphasize, 
obtains in the absence of Romberg quadrature. 

Figure 15 also shows the absolute spin error, but now as a function of particle 
emittance, after a single turn in the RHIC lattice. As in figure 14, no Romberg 
steps were applied—except in computing the reference solution—so that here also 
the differences seen derive solely from differences in the accuracy of the orbital 
integration. In addition, we have reduced the noise level in this graphic by, as in 
figure 5, averaging over narrow slices of emittance. 

5.2.3. Evolution of spin errors. We also examined how spin errors evolve over the 
course of many simulated turns in RHIC. In figure 16 we use the accumulated spin 
error to compare different methods of integration. The green curve shows results 
obtained using the previous standard: a TEAPOT split with four slices for most 
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Figure 15. Amplitude dependence of the single-turn spin errors for three 
different integrators: drift-kick with = 32 (upper), and BK/MK with = 1 
(middle), and = 4 (lower). 



Figure 16. Accumulated spin error as a function of turn number, for multi¬ 
turn tracking in a RHIC lattice, using different methods of integration. From 
top to bottom: drift-kick integration with = 8 and no tune-fit (tan); the 
previous standard, a TEAPOT split with = 4 (green); BK/MK with 
= 4 and no Romberg (blue); and BK/MK with = 4 (purple). The 
simulated beam had a 95% emittance of 157rmmmrad. 


elements, but 16 slices for the strong elements around the interaction regions. The 
lowest two curves we obtained using the new integrators, also with = 4. For the 
blue curve, we used just the “trapezoidal” rule (PWC of figure 7). For the lowest 
curve, we computed a Romberg limit using a maximum k of 3. 

The topmost curve, obtained using drift-kick integration with nij = 8, but without 
retuning the lattice, exhibits pronounced oscillations. The tune errors in the orbital 
integration translate into these periodic oscillations of the spin errors. 
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Figure 17. Time required to track N particles through 100 turns of the RHIC 
lattice with GPU-accelerated integrators (solid). The CPU time (dashed) is 
an estimate; see text for details. 

6. GPU-accelerated implementation 

In recent years, graphics processing units (GPUs) have emerged as an effec¬ 
tive means to accelerate compute-intensive workloads. Because they are designed 
for massively data-parallel workloads with very regular control flow and memory 
access patterns, GPUs can dedicate a much larger fraction of their transistors 
to processing elements (registers and arithmetic and logic units) than can CPUs. 
As a consequence, they can achieve significantly larger computational throughput, 
performance per watt, and performance per dollar for workloads that are well-suited 
to their architecture. Particle tracking in the absence of space-charge effects is an 
embarrassingly parallel problem, and is therefore a natural fit for GPUs. 

To map the spin-orbit tracking problem onto GPU hardware, we assign a GPU 
thread to each particle. For the actual tracking, we first transfer the state of a bunch 
to global memory on the GPU. This state comprises six-dimensional orbit data, 
three-dimensional spin data, plus the quaternions required for accumulating spin 
rotations. Particle tracking through individual beamline elements is broken up into 
several GPU kernels. We first compute the orbital motion through a particular 
beamline element. This requires loading orbit data at the element entrance from 
global memory into registers, computing the orbital motion using the integrators 
discussed in detail above, and then, at the element exit, writing the updated state 
back to global memory. During the orbital computation, we record in global memory 
the spin precession vector U at the edge of each orbital slice. After integrating the 
orbital data through the given element, we then use the recorded U data to update 
the quaternions using PWC spin integration plus one or more Romberg steps. We 
then update the spin. 

The particle data resides on the GPU throughout spin-orbit tracking. By main¬ 
taining the data on the GPU, we avoid unnecessary data transfers between GPU and 
CPU, which can significantly slow down simulations. Particle data is occasionally 
copied back to the CPU in order to perform analysis; e.g., to compute (or update) 
the invariant spin field (ISF) at a particular location in the accelerator. 
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Figure 17 shows a comparison of the performance of spin-orbit tracking on a 
GPU and a CPU. The GPU simulations were carried out on the Dirac cluster at 
the National Energy Research Scientific Computing center (NERSC). The CPU 
performance is an estimate for an 8-core CPU, based on the assumption that the 
simulation scales perfectly to 8 CPU cores and that each CPU core is a factor 4 
faster than a GPU core. For up to a few hundred particles the CPU is faster than 
the GPU. For larger numbers of particles, the GPU accelerated simulations are 
faster. For a few ten-thousands of particles, the GPU accelerated simulations are 
approximately 15 times faster than the corresponding CPU implementation. 


7. Conclusion 

In section 5.1, we showed that the BK/MK integrators can be much more 
efficient than the drift-kick integrators. This, of course, is well known (but see 
Forest’s discussion [23] for why drift-kick integration remains useful). Our purpose 
in including such a comparison here has to do with the influence of accurate orbit 
integration on the acccuracy of spin integration. 

Figure 14 tells us that better orbit data yields more accurate spin integration. 
Using BK/MK integration yields a two-decade reduction of the spin errors as 
compared to drift-kick integration. We see that the improvement with the number 
of orbital slices is, in large part, due to the improvement in the orbital accuracy. In 
other words, with drift-kick integration the accuracy of the spin motion will improve 
just by increasing the number of orbital slices while holding the number of spin 
integration steps fixed. 

On the other hand, for BK/MK integration, the discretization of the spin motion 
appears to be a significant source of error: A comparison of figures 4 and 5 with 
figures 14 and 15 show that the relative error in the orbital motion is much smaller 
than that in the spin rotation angle. 

In section 4.2 we discussed interpolating the spin precession vectors, using cubic 
Bezier polynomials, as a means of rapidly computing multiple spin steps per orbital 
slice. To compute a Bezier curve for U, one must first determine the associated 
control points (see figure 1), and doing so requires computing the longitudinal 
derivative of U at the beginning and end of each slice. For most elements, this is a 
non-trivial, hence time-consuming, computation. In other words, if we wish to enjoy 
the fast computation of many Us along an interpolating Bezier curve, we must first 
invest some time computing U'. As a trade-off, this means we benefit only if we 
plan to interpolate U at some (possibly large) number of intermediate locations. 

In the bargain, however, we do not improve the rate at which spin integration 
converges: The accuracy of this approach cannot improve upon what we obtain by 
taking smaller steps (z.e., using more orbital slices). In other words, the best we 
might hope for from interpolating the precession vector U, is a faster computation 
of PWC spin integration, which exhibits just second-order convergence. 
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As shown in figure 7, as well as in figures 8 through 13, the application of Romberg 
quadratures to PWC spin integration does increase the rate of convergence. Moreover, 
this approach requires only a modest number of orbital slices. Applying the Romberg 
algorithm to a sequence of quaternion products constitutes an easy and fast means 
of accelerating the convergence of spin integration. As a consequence, we have not 
pursued further the use of Bezier curves to speed up the computation of 

Note that our initial motivation for using quaternions was the factor of two 
savings in arithmetic they convey. After discovering that Romberg applied to 
quaternions accelerates the convergence of the spin motion, we have another—much 
more significant—reason to use quaternions when integrating the Thomas-BMT 
equation for spin motion. 

Concerning the convergence of spin motion, we here comment on what happens 
when simulating a system near a spin-orbit resonance [9]. First, note that in such 
a system a particle’s spin orientation becomes very sensitive to even very tiny 
perturbations. As a consequence, if we follow a single spin and ask for it’s motion 
to converge as we slice the lattice ever more finely, we may never see that happen. 
As with the orbital motion, we do not—cannot—ask for the “exact result”, because 
the machine we build is not the one we designed (though we hope it is close!). We 
instead ask for accurate qualitative dynamics, as revealed by phase-space portraits. 

Second, note that the integrators are particular to given elements. In other words, 
they care only about the mapping of phase-space coordinates across a single element: 
z'^ at entrance ^ at exit. On the other hand, a resonance condition can exist 
only in a ring: it is defined by the ring. One might therefore expect that the choice 
of integrator is, in some sense, independent of the proximity and strength of a 
resonance. But this expectation can hold only if the integrators are independent of, 
or have no effect on, whatever properties define the resonance. 

When using drift-kick integrators, we know the tune seen in tracking data depends 
on how finely we slice our elements. What happens, then, as we approach a resonance? 
If we increase the slicing so as to ensure convergence, the effective tune changes. 
Hence the distance to resonance changes, and the spin dynamics can appear quite 
different—especially because of its sensitivity in the neighborhood of a resonance. If, 
on increasing the slicing we also re-tune the lattice, then the distance to resonance, 
as defined by the orbital tunes, remains the same. But the quadrupole strengths do 
change, and this implies a change in the spin tune for a given simulated particle [9]. 
And so again the distance to resonance changes, and again the spin dynamics can 
appear quite different because of its sensitivity in the neighborhood of a resonance. 

Because BK/MK integrators yield the same linear orbital tunes independently 
of slicing, they are not subject to the same difficulties. 

In addition to the improved accuracy and accelerated convergence of spin tracking, 
our code benefits from the significant speed-up it derives just from its implementation 
on GPUs. As shown in figure 17, a reasonable estimate for that benefit yields a speed¬ 
up factor of about 15. On a more practical level, we note that on one of NVIDIA’S 
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GeForce TITAN GPU nodes, computing an ISF for RHIC at 10^ phase-space 
locations using 10^ turns takes less than fifteen minutes. 

Finally, we are currently implementing integrators for other types of optical 
elements. These include electrostatic lenses, which are important for simulating 
EDM experiments, as well as better models for Siberian snakes. 
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